Scalable, Balanced Model-based Clustering
نویسندگان
چکیده
This paper presents a general framework for adapting any generative (model-based) clustering algorithm to provide balanced solutions, i.e., clusters of comparable sizes. Partitional, model-based clustering algorithms are viewed as an iterative two-step optimization process—iterative model re-estimation and sample re-assignment. Instead of a maximum-likelihood (ML) assignment, a balanceconstrained approach is used for the sample assignment step. An efficient iterative bipartitioning heuristic is developed to reduce the computational complexity of this step and make the balanced sample assignment algorithm scalable to large datasets. We demonstrate the superiority of this approach to regular ML clustering on complex data such as arbitraryshape 2-D spatial data, high-dimensional text documents, and EEG time series.
منابع مشابه
Contents I Part A 7 1 Clustering with Balancing Constraints 9
In many applications of clustering, solutions that are balanced, i.e, where the clusters obtained are of comparable sizes, are preferred. This chapter describes several approaches to obtaining balanced clustering results that also scale well to large data sets. First, we describe a general scalable framework for obtaining balanced clustering which first clusters only a small subset of the data ...
متن کاملTesting Several Rival Models Using the Extension of Vuong\'s Test and Quasi Clustering
The two main goals in model selection are firstly introducing an approach to test homogeneity of several rival models and secondly selecting a set of reasonable models or estimating the best rival model to the true one. In this paper we extend Vuong's method for several models to cluster them. Based on the working paper of Katayama $(2008)$, we propose an approach to test whether rival models h...
متن کاملAn Adaptive LEACH-based Clustering Algorithm for Wireless Sensor Networks
LEACH is the most popular clastering algorithm in Wireless Sensor Networks (WSNs). However, it has two main drawbacks, including random selection of cluster heads, and direct communication of cluster heads with the sink. This paper aims to introduce a new centralized cluster-based routing protocol named LEACH-AEC (LEACH with Adaptive Energy Consumption), which guarantees to generate balanced cl...
متن کاملA New WordNet Enriched Content-Collaborative Recommender System
The recommender systems are models that are to predict the potential interests of users among a number of items. These systems are widespread and they have many applications in real-world. These systems are generally based on one of two structural types: collaborative filtering and content filtering. There are some systems which are based on both of them. These systems are named hybrid recommen...
متن کامل